fcGENE: A Versatile Tool for Processing and Transforming SNP Datasets

Authors

  • Nab Raj Roshyara
  • Markus Scholz
Abstract

BACKGROUND Modern analysis of high-dimensional SNP data requires a number of biometrical and statistical methods such as pre-processing, analysis of population structure, association analysis and genotype imputation. Software used for these purposes often relies on specific and mutually incompatible input and output data formats. Extensive data management, including multiple format conversions, is therefore necessary during analyses.

METHODS To support fast and efficient management and bio-statistical quality control of high-dimensional SNP data, we developed the publicly available software fcGENE in the object-oriented programming language C++. The software simplifies and automates the use of different existing analysis packages, especially during the workflow of genotype imputation and the corresponding analyses.

RESULTS fcGENE transforms SNP data and imputation results into the formats required by a large variety of analysis packages such as PLINK, SNPTEST, HAPLOVIEW, EIGENSOFT and GenABEL, and by tools used for genotype imputation such as MaCH, IMPUTE, BEAGLE and others. Data management tasks such as merging, splitting and extracting SNP and pedigree information can be performed. fcGENE also supports a number of bio-statistical quality control and quality-based filtering processes at the SNP and sample level. The tool also generates templates of the commands required to run specific software packages, especially those required for genotype imputation. We demonstrate the functionality of fcGENE with example workflows of SNP data analyses and provide a comprehensive manual of commands, options and applications.

CONCLUSIONS We have developed fcGENE, a user-friendly open-source software tool that comprehensively supports SNP data management, quality control and analysis workflows. Download statistics and corresponding feedback indicate that the software is well recognised and extensively applied by the scientific community.
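To make the SNP-wise quality filtering mentioned in the abstract concrete, the following is a minimal Python sketch and not fcGENE code. It assumes whitespace-separated PLINK-style PED lines (six pedigree and phenotype columns followed by two allele columns per SNP, with `0` marking a missing allele) and biallelic SNPs. The function names `parse_ped`, `snp_stats` and `filter_snps` and the threshold values are hypothetical choices for illustration only.

```python
from collections import Counter


def parse_ped(lines):
    """Parse PLINK-style PED lines into (sample_ids, genotypes).

    genotypes[i][j] is a pair of allele strings for sample i and SNP j;
    the allele code "0" marks a missing call (assumed convention).
    """
    sample_ids, genotypes = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample_ids.append(fields[1])  # second column: individual ID
        alleles = fields[6:]          # first six columns: pedigree and phenotype
        if len(alleles) % 2:
            raise ValueError("odd number of allele columns")
        genotypes.append(
            [(alleles[k], alleles[k + 1]) for k in range(0, len(alleles), 2)]
        )
    return sample_ids, genotypes


def snp_stats(genotypes, snp_index):
    """Return (call_rate, minor_allele_frequency) for one SNP."""
    counts = Counter()
    called = 0
    for row in genotypes:
        a, b = row[snp_index]
        if a == "0" or b == "0":
            continue  # missing genotype: not counted as called
        called += 1
        counts[a] += 1
        counts[b] += 1
    call_rate = called / len(genotypes) if genotypes else 0.0
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return call_rate, 0.0  # no calls, or monomorphic SNP
    return call_rate, min(counts.values()) / total


def filter_snps(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Return indices of SNPs passing both (illustrative) thresholds."""
    n_snps = len(genotypes[0]) if genotypes else 0
    keep = []
    for j in range(n_snps):
        call_rate, maf = snp_stats(genotypes, j)
        if call_rate >= min_call_rate and maf >= min_maf:
            keep.append(j)
    return keep


if __name__ == "__main__":
    ped = [
        "F1 S1 0 0 1 1 A A G G 0 0",
        "F2 S2 0 0 2 1 A C G G C T",
        "F3 S3 0 0 1 2 C C G G 0 0",
    ]
    ids, geno = parse_ped(ped)
    print(filter_snps(geno, min_call_rate=0.9, min_maf=0.05))  # [0]
```

In this toy input the second SNP is monomorphic and the third is missing in two of three samples, so only the first SNP is kept. A sample-wise filter would follow the same pattern with rows and columns swapped.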


Similar resources

Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...


Building Grid-Based Applications for the Management and Analysis of Neuroimaging Data Sets for the Medical Grid

In this paper, we discuss the application of grid technology in the management and processing of digital medical images, in particular, functional magnetic resonance imaging (fMRI). Functional MRI is a non-invasive technique used to investigate the functions of the human brain. It usually involves the processing of thousands of MR images acquired as individual slices with a spatial resolution o...


FAGI-tr: A Tool for Aligning Geospatial RDF Vocabularies

In this paper, we present FAGI-tr, a tool for aligning RDF vocabularies with respect to their geospatial aspect. The tool provides a framework for (a) loading a source and a target geospatial RDF dataset, (b) identifying vocabularies for representing geospatial RDF data, (c) selecting, from both datasets, the representations to be considered for processing, (d) selecting a target vocabulary and...


Laminar Organization of Cerebral Cortex in Transforming Growth Factor Beta Mutant Mice

Transforming growth factor betas (TGFβs) are among the most widespread and versatile cytokines. The three mammalian TGFβ isoforms, β1, β2, and β3, and their receptors regulate proliferation of neuronal precursors as well as survival and differentiation in neurons of the developing and adult nervous system. Functions of TGFβs have a wide spectrum ranging from regulating cell proliferation and differ...


MACGT: multi-dimensional automated clustering genotyping tool for analysis of microarray-based mini-sequencing data

SUMMARY Multi-dimensional Automated Clustering Genotyping Tool (MACGT) is a Java application that clusters complex multi-dimensional vector data derived from single nucleotide polymorphism (SNP) genotyping experiments using mini-sequencing based microarray chemistries such as arrayed primer extension (APEX). Spot intensity output files from microarray experiments across multiple samples are imp...



Volume 9

Publication date: 2014